Skip to content

[Json] Use partition and take in RunEndEncoded decoder#9658

Open
liamzwbao wants to merge 5 commits intoapache:mainfrom
liamzwbao:issue-9645-ree-optimization
Open

[Json] Use partition and take in RunEndEncoded decoder#9658
liamzwbao wants to merge 5 commits intoapache:mainfrom
liamzwbao:issue-9645-ree-optimization

Conversation

@liamzwbao
Copy link
Copy Markdown
Contributor

@liamzwbao liamzwbao commented Apr 3, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Optimize RunEndEncoded decoder to use partition and take, substantially improving performance (over 2x speedup).

Are these changes tested?

Covered by existing tests

Are there any user-facing changes?

No

@github-actions github-actions bot added the arrow Changes to the arrow crate label Apr 3, 2026
@liamzwbao liamzwbao force-pushed the issue-9645-ree-optimization branch from 568cceb to ffd5849 Compare April 3, 2026 21:46
@liamzwbao liamzwbao marked this pull request as ready for review April 3, 2026 21:49
@liamzwbao
Copy link
Copy Markdown
Contributor Author

Local benchmark results

Benchmarking decode_short_ree_runs_json/131072: Collecting 50 samples in estimated 5.1127 s (650 ite
decode_short_ree_runs_json/131072
        time:   [7.7892 ms 7.8779 ms 7.9584 ms]
        thrpt:  [217.23 MiB/s 219.45 MiB/s 221.95 MiB/s]
 change:
        time:   [−49.865% −49.220% −48.647%] (p = 0.00 < 0.05)
        thrpt:  [+94.730% +96.929% +99.463%]
        Performance has improved.

Benchmarking decode_long_ree_runs_json/131072: Collecting 50 samples in estimated 5.1862 s (700 iter
decode_long_ree_runs_json/131072
        time:   [6.5677 ms 6.6715 ms 6.7831 ms]
        thrpt:  [223.96 MiB/s 227.71 MiB/s 231.31 MiB/s]
 change:
        time:   [−56.851% −56.093% −55.362%] (p = 0.00 < 0.05)
        thrpt:  [+124.03% +127.75% +131.75%]
        Performance has improved.

Benchmarking decode_short_ree_runs_serialize: Collecting 100 samples in estimated 5.8899 s (600 iter
decode_short_ree_runs_serialize
        time:   [8.9996 ms 9.0555 ms 9.1159 ms]
        change: [−49.254% −48.848% −48.441%] (p = 0.00 < 0.05)
        Performance has improved.

Benchmarking decode_long_ree_runs_serialize: Collecting 100 samples in estimated 5.2928 s (600 itera
decode_long_ree_runs_serialize
        time:   [8.4019 ms 8.4652 ms 8.5338 ms]
        change: [−52.747% −52.330% −51.944%] (p = 0.00 < 0.05)
        Performance has improved.

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @liamzwbao

Comment thread arrow-json/Cargo.toml
arrow-buffer = { workspace = true }
arrow-cast = { workspace = true }
arrow-data = { workspace = true }
arrow-ord = { workspace = true }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are fairly non trivial crates (ord and select) so it is sad to see the dependencies being added here

That being said, I think it is becoming clear that anything involving REE benefits from those two

Maybe we could split them up or osmething into new crates with the "core" parts (specifically partition and take) 🤔

arrow-take and arrow-partition maybe 🤔

I'll file a ticket to consider this

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let values_data = mutable.freeze();
let run_ends_data =
PrimitiveArray::<R>::new(ScalarBuffer::from(run_ends), None).into_data();
let indices = UInt32Array::from_iter_values(indices.into_iter().map(|i| i as u32));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory the old code could also handle usize indices not ust u32 but I think in practice it won't matter

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, also indices are bounded by the tape pos (which is &[u32]), so they should never exceed u32::MAX IIUC

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Json] RunEndEncoded decoder optimization

2 participants